Summary of Accomplishments

Computing systems for managing critical infrastructures must tolerate
failures and be resistant to attack. This project has explored techniques
for building such survivable critical-infrastructure systems. Mechanisms
were developed for ensuring integrity of hosts that execute mobile code and
for ensuring fault-tolerance of computations that are structured in terms
of mobile code. We also explored automated techniques for analyzing the
fault-tolerance of distributed systems. Finally, we initiated a research
program into security policy enforcement, both characterizing which
policies are enforceable and devising new object-code rewriting methods
for security policy enforcement.

A list of the publications produced by the project appears as the final
section of this report. Included among those 22 publications are two books:
a graduate-level monograph on reasoning about concurrent programs and a
now widely cited National Research Council volume on information systems
trustworthiness. Also, two patents in the area of fault-tolerance were
granted to the principal investigator and his industrial collaborators.
Detailed Description of Technical Progress

Agent  Integrity 

Agents comprising an application must not only survive (possibly
malicious) failures of the hosts they visit, but they must also be
resilient to hostile actions by other hosts. Replication and voting enable
an application to survive some failures of the hosts it visits. Hosts that
are not visited by agents of the application, however, can masquerade and
confound a replication scheme. Two classes of protocols to solve these
agent integrity problems were developed as part of this project [1,9]. One
class uses chained cryptographic certificates; the second class uses
cryptographic signature-sharing. We were then able to unify these protocols
by viewing them in terms of delegation. In each, the principals are sets of
hosts (services) and authorization is transferred from one principal to
another.

In some settings, hosts being visited by agents cannot be replicated, so
the preceding protocols do not apply. This led us to investigate protocols
for agent fault-tolerance without host replication (see footnote 1). With
these NAP protocols, execution of an agent A on a host is monitored by
agents (napping) on other hosts [20]. If the failure of A or of the host on
which A executes is detected, then one of the napping agents performs a
recovery action. This recovery action might involve retrying A,
dispatching a different agent to some other host, or alerting the
computation’s initiator to a problem. NAP is not resilient to hostile host
failures, but without using replication no scheme can be.
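
The monitor-and-recover pattern described above can be illustrated with a
minimal single-process sketch. It is not the NAP protocol itself: hosts are
modeled as plain Python functions, a detectable crash is modeled as an
exception, and the names HostFailure, healthy_host, crashed_host, and
run_stage_with_nap are hypothetical. Only crash (fail-stop) failures are
modeled, in keeping with the remark that NAP does not handle hostile hosts.

```python
class HostFailure(Exception):
    """Signals that a (simulated) host crashed while running an agent."""


def healthy_host(stage, state):
    # A working host simply runs the agent's stage on the given state.
    return stage(state)


def crashed_host(stage, state):
    # A failed host never produces a result; the crash is detectable.
    raise HostFailure("host is down")


def run_stage_with_nap(stage, state, hosts):
    """Run `stage` on the first host that does not fail.

    The loop plays the role of a napping agent: it watches each attempt
    and, when a failure is detected, recovers by retrying on the next host.
    Returns the result and the names of the hosts that failed.
    """
    failed = []
    for host in hosts:
        try:
            result = host(stage, state)
        except HostFailure:
            failed.append(host.__name__)  # detected failure; try elsewhere
            continue
        return result, failed
    # No host succeeded: the remaining recovery action is to alert the initiator.
    raise RuntimeError("alert initiator: stage failed on " + ", ".join(failed))


result, failed = run_stage_with_nap(lambda s: s + 1, 41,
                                    [crashed_host, healthy_host])
print(result, failed)
```

A real deployment would also need the coordination among several napping
agents that the next paragraph discusses; this sketch has a single monitor
and so sidesteps that problem entirely.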

The difficult part of implementing NAP involves coordinating the napping
agents. A protocol that tolerates multiple failures must have multiple
agents napping, each monitoring execution. A coordination protocol is
required to ensure that at most one napping agent detects a failure and
tries to restart the failed agent. Our initial solutions to the
coordination problem were complex enough that their correctness was
suspect. This led us to show that the problem was actually an instance of
the (fail-stop) reliable broadcast problem that we solved in 1983. By
refining our 1983 protocol, we were able to support a broad class of
strategies for how napping agents are dispersed in the network. This
broader class of strategies also allows our protocols to work when the
trajectory of an agent folds back on itself, visiting a host that is still
running a napping agent.

1. This work is joint with Dag Johansen at the University of Tromsoe
(Norway) and Keith Marzullo at the University of California, San Diego.
Analysis of System Fault-Tolerance

Ad hoc reasoning about fault tolerance is unsatisfactory for large,
critical-infrastructure systems. Only rigorous analysis with mechanized
support can give the needed confidence; only a tool that is usable by
system designers can have a real impact. Therefore, we continued our
investigations (jointly with Scott Stoller) into a new verification
framework that is specialized to fault tolerance [4,13]. The framework,
which is based on a stream-processing model of computation, permits more
natural specifications of fault-tolerance requirements than general-purpose
formalisms and supports mechanized analysis of system fault-tolerance.

In stream-processing models, each component of a system is represented by
an input-output function describing its behavior. For simplicity,
processes are assumed to communicate only by messages transmitted along
unbounded FIFO channels. Behaviors of a system can be determined from
input-output functions describing the system’s components by doing a
fixed-point calculation. This provides a clean algorithmic basis for our
analysis. Each input-output function encapsulates the implementation of a
component, enabling a convenient separation of local and global analyses.
Local analysis verifies independently for each component that the proposed
input-output function faithfully represents its behavior. Global analysis,
in the form of the fixed-point calculation, determines the system’s
behavior from the input-output functions.
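
To make the fixed-point idea concrete, here is a small sketch under
simplifying assumptions that are ours, not the framework's: streams are
finite Python tuples, each component reads exactly one input channel, and
all outputs are recomputed together in each round. The component functions
client and server, the channel names, and fixed_point are hypothetical, and
no approximation is used.

```python
def client(replies):
    # Sends one request unprompted, then one acknowledgement per reply seen.
    return ("req",) + tuple("ack" for _ in replies)


def server(messages):
    # Answers every request; acknowledgements need no answer.
    return tuple("reply" for m in messages if m == "req")


def fixed_point(components, max_rounds=100):
    """Iterate the input-output functions until no channel history changes.

    `components` maps each output channel to (function, input channel).
    Every channel starts empty; each round recomputes all outputs from the
    previous round's histories.
    """
    histories = {channel: () for channel in components}
    for _ in range(max_rounds):
        updated = {out: fn(histories[inp])
                   for out, (fn, inp) in components.items()}
        if updated == histories:
            return histories
        histories = updated
    raise RuntimeError("no fixed point reached within max_rounds")


system = {
    "client_to_server": (client, "server_to_client"),
    "server_to_client": (server, "client_to_server"),
}
print(fixed_point(system))
```

The resulting dictionary, one message history per channel, is a concrete
stand-in for the labeled edges of a graph over the two components.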

The fixed-point calculation produces a graph, called a message flow graph,
representing possible communication behaviors of the system. Each node of
the graph corresponds to a component, and each edge is labeled with a
description of the sequence of messages sent from the source node to the
target node. Exact computation of all possible sequences of messages that
might be sent is generally infeasible. So, to help make automated analysis
feasible, our framework supports flexible and powerful approximations, or
abstractions, as they are called in the literature on abstract
interpretation. Traditionally, stream-processing models have not
incorporated approximations. The approximations in our framework enable
compact representation of the highly non-deterministic behavior
characteristic of severe failures and also support abstraction from
irrelevant aspects of a system’s failure-free behavior. The latter reflects
a separation of concerns that is crucial for making the fault-tolerance
analysis tractable.

We use only conservative approximations, so the analysis never falsely
implies that a system satisfies its fault-tolerance requirement. But
approximations do introduce the possibility of false negatives: the
analysis might not establish that a system satisfies its fault-tolerance
requirement, even though it does.

A common approach to modeling failures is to treat them as events that
occur non-deterministically during a computation. This makes it difficult
to separate the effects of failures from other aspects of the system’s
behavior and, consequently, to model the former more finely than the
latter. In particular, one often wants to avoid case analysis corresponding
to non-determinism in a system’s failure-free behavior, while case analysis
corresponding to different combinations of failures appears unavoidable in
general in automated analysis of fault-tolerance. A failure scenario for a
system is an assignment of component failures to a subset of the system’s
components. In our approach, each input-output function is parameterized by
possible failures in the corresponding component; system behavior is
analyzed separately for each failure scenario of interest.
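
The following sketch illustrates scenario-by-scenario analysis on a toy
system of three replicas and a majority voter. It runs concrete values
rather than the abstract analysis the framework performs, and everything in
it is an assumption made for illustration: the failure kinds "crash" and
"corrupt", the functions replica, voter, failure_scenarios, and analyze,
and the requirement that the voter output the correct value.

```python
from collections import Counter
from itertools import combinations, product

FAILURE_KINDS = ("crash", "corrupt")  # hypothetical failure modes


def replica(request, failure=None):
    """Input-output function of one replica, parameterized by its failure."""
    if failure == "crash":
        return None         # a crashed replica sends nothing
    if failure == "corrupt":
        return -1           # a corrupted replica sends a wrong value
    return request * 2      # failure-free behavior


def voter(replies):
    """Return the value sent by a strict majority of replicas, else None."""
    counts = Counter(r for r in replies if r is not None)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count > len(replies) // 2 else None


def failure_scenarios(components, max_failures):
    """Yield every assignment of failure kinds to at most max_failures components."""
    for k in range(max_failures + 1):
        for subset in combinations(components, k):
            for kinds in product(FAILURE_KINDS, repeat=k):
                yield dict(zip(subset, kinds))


def analyze(max_failures, request=21):
    """Return the failure scenarios in which the voter's output is wrong."""
    names = ("r1", "r2", "r3")
    violations = []
    for scenario in failure_scenarios(names, max_failures):
        replies = [replica(request, scenario.get(n)) for n in names]
        if voter(replies) != request * 2:
            violations.append(scenario)
    return violations


print(len(analyze(1)), len(analyze(2)))
```

Each scenario is checked independently, so the case analysis is over
combinations of failures only, not over interleavings of failure events.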

In  our  framework,  possible  communications  (in  a  given  failure  scenario) 
between  two  components  are  characterized  by  approximations  of  values  (the 
data  transmitted  in  messages),  multiplicities  (the  number  of  times  each  value 
is  sent),  and  message  orderings  (the  order  in  which  values  are  sent).  Values 
and  multiplicities  are  approximated  using  a  form  of  abstract  interpretation 
and  a  form  of  symbolic  computation.  Message  orderings  are  approximated 
using  partial  (instead  of  total)  orders. 
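
As a rough illustration of approximating multiplicities, the sketch below
uses integer intervals (lo, hi) as the abstract domain. The report does not
say which domain the framework uses, so intervals are purely an assumed
stand-in, and the names join, add, within, and the three constants are
hypothetical.

```python
import math

EXACTLY_ONCE = (1, 1)
AT_MOST_ONCE = (0, 1)        # e.g. a sender that may omit the message
ANY_NUMBER = (0, math.inf)   # e.g. an arbitrarily faulty sender


def join(a, b):
    """Smallest interval covering both a and b (merges alternative behaviors)."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def add(a, b):
    """Multiplicity of a value that is sent by two sources."""
    return (a[0] + b[0], a[1] + b[1])


def within(interval, lo, hi):
    """True only if every multiplicity the interval allows lies in [lo, hi]."""
    return lo <= interval[0] and interval[1] <= hi


print(join(EXACTLY_ONCE, (0, 0)))
print(add(EXACTLY_ONCE, AT_MOST_ONCE))
print(within(add(EXACTLY_ONCE, ANY_NUMBER), 1, 2))
```

Because an interval may include multiplicities that never occur, a check
such as within can fail for a system that is in fact correct, but it cannot
succeed for one that is not; this is the conservative behavior described
earlier.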

Our analysis method was implemented in a prototype tool called CRAFT. We
have used CRAFT to analyze our protocols for agent integrity and the Oral
Messages algorithm for Byzantine Agreement.

Enforceable  Security  Policies 

A security policy defines executions that, for one reason or another, have
been deemed unacceptable. To date, application-independent security
policies (like mandatory and discretionary access control, information flow
restrictions, and resource availability) have attracted most of the
attention. But with the expanding role of computers in our infrastructure,
specialized, application-dependent security policies are becoming
increasingly important. For example, a system to support mobile code might
prevent information leakage by enforcing a security policy that bars
messages from being sent after files are read. To support electronic
commerce, a security policy might prohibit executions in which a customer
pays for a service but the seller does not provide that service.

Over the period of this grant, we developed a mathematical
characterization of which security policies are enforceable [9]. First, we
proved that enforcement mechanisms cannot exist for security policies that
are not safety properties. Second, we developed a new class of enforcement
mechanisms and proved that it is complete for the set of all enforceable
security policies [22]. Our new class of mechanisms is based on security
automata, automata that accept finite and infinite sequences.

A security automaton serves as an enforcement mechanism for some target
system by monitoring and controlling the execution of that system. Each
action or new state corresponding to a next step that the target system
takes is sent to the security automaton and serves as the next symbol of
that automaton’s input. If the automaton cannot make a transition on an
input symbol, then the target system is about to violate the security
policy specified by the automaton, and the target system is terminated.
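
A minimal sketch of this monitoring loop follows, using the policy
mentioned earlier that bars sending a message after a file has been read.
The class SecurityAutomaton, the state names, the transition table, and
run_monitored are hypothetical, and the target system is reduced to a list
of action names.

```python
class SecurityAutomaton:
    """Deterministic automaton; a missing transition means a violation."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions  # {(state, symbol): next_state}

    def accepts_next(self, symbol):
        """Advance on `symbol` if a transition exists; report whether it did."""
        next_state = self.transitions.get((self.state, symbol))
        if next_state is None:
            return False
        self.state = next_state
        return True


# Assumed encoding of the policy "no send after a read".
NO_SEND_AFTER_READ = {
    ("start", "read"): "has_read",
    ("start", "send"): "start",
    ("has_read", "read"): "has_read",
    # Deliberately no ("has_read", "send") entry: that step is forbidden.
}


def run_monitored(actions, automaton):
    """Feed each action to the automaton before letting it take effect."""
    executed = []
    for action in actions:
        if not automaton.accepts_next(action):
            return executed, "terminated"   # stop before the violating step
        executed.append(action)
    return executed, "completed"


monitor = SecurityAutomaton("start", NO_SEND_AFTER_READ)
print(run_monitored(["send", "read", "send", "read"], monitor))
```

In this sketch any action without an entry in the table, including one the
policy never mentions, causes termination; a fuller encoding would add
self-loops for irrelevant actions.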

We demonstrated the practicality of enforcing security policies expressed
using security automata by constructing and evaluating tools to generate
inlined reference monitors that implement security automata for both the
Java Virtual Machine and Intel x86 machines. The first prototype (SASI)
worked for programs written in or compiled into Java virtual machine code
(JVML) or Intel’s x86 machine code; a second generation (PoET/PSLang)
refined the approach for JVML. Specifically, given a security automaton SA
that expresses a security policy and given a machine language program P,
both SASI and PoET/PSLang add to P the checks needed to ensure that
executing P cannot violate the security policy defined by SA. In addition,
using standard compiler analyses, our prototypes attempt to minimize the
number of checks inserted.
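
The idea of inlining checks can be sketched on a toy instruction list. This
is not the rewriting algorithm used by SASI or PoET/PSLang: the program
representation, the ("check", op) pseudo-instruction, the POLICY table, and
the functions inline_checks and execute are all hypothetical. The only
check minimization shown is the trivial one of skipping instructions the
policy never mentions.

```python
# Assumed policy: no "send" after a "read" (missing entry = violation).
POLICY = {
    ("start", "read"): "has_read",
    ("start", "send"): "start",
    ("has_read", "read"): "has_read",
}
# Only operations named in the policy need a check.
RELEVANT = {op for (_, op) in POLICY}


def inline_checks(program):
    """Return a rewritten program with a check before each relevant op."""
    rewritten = []
    for op in program:
        if op in RELEVANT:
            rewritten.append(("check", op))
        rewritten.append(op)
    return rewritten


def execute(program, transitions, state="start"):
    """Tiny interpreter: halt when an inlined check finds no transition."""
    trace = []
    for instr in program:
        if isinstance(instr, tuple):        # an inlined check
            _, op = instr
            if (state, op) not in transitions:
                return trace, "halted"
            state = transitions[(state, op)]
        else:
            trace.append(instr)             # ordinary instruction runs
    return trace, "finished"


rewritten = inline_checks(["add", "read", "add", "send", "add"])
print(rewritten)
print(execute(rewritten, POLICY))
```

The automaton state travels with the rewritten program, so no separate
monitor process is needed at run time.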

Using SASI, we experimented with generalizations of two well-known
security policies: software fault isolation (SFI) and the Java Standard
Security Manager. Our experiments confirmed that SASI generates code whose
performance is comparable with that of hand-coded, heavily optimized SFI
tools for the x86, and in fact exceeds that of the hand-coded Java Standard
Security Manager. Furthermore, security automaton specifications of the
security policies have proven to be easy to write, understand, and modify.
Using PoET/PSLang, we showed how to support the Java 2 “stack inspection”
security policy without any support from the Java virtual machine. This,
for example, allows Java 2 programs to be executed on previous generations
of the Java run-time system; it also allows deployment of variations and
refinements of the Java security policy.